Strong Converse for 
Identification via Quantum Channels 
Abstract: In this paper we present a simple proof of the strong converse for identification via discrete memoryless quantum channels, based on a novel covering lemma. The new method is a generalization to quantum communication channels of Ahlswede's recently discovered approach to classical channels. It involves a development of explicit large deviation estimates to the case of random variables taking values in selfadjoint operators on a Hilbert space. This theory is presented separately in an appendix, and we illustrate it by showing its application to quantum generalizations of classical hypergraph covering problems.

Keywords — Identification, covering hypergraphs, quantum 
channels, large deviations. 



I. Introduction 

Ahlswede and Dueck j|| found the identification capac- 
ity of a discrete memoryless channel by establishing the 
optimal (second order) rate via a so-called soft converse. 
Subsequently, the strong converse, conjectured by them, was proved by Han and Verdú. Even their second, simplified proof [12] uses rather involved arguments.

In H it is shown how simple ideas regarding coverings 
of hypergraphs (formalized in lemma ||) can be used to 
obtain the approximations of output statistics needed in 
the converse. 

Formally, we investigate the following situation: consider a discrete memoryless channel $W^n : \mathcal{X}^n \to \mathcal{Y}^n$ ($n \geq 1$), i.e. for $x^n = x_1 \ldots x_n \in \mathcal{X}^n$, $y^n = y_1 \ldots y_n \in \mathcal{Y}^n$

$$W^n(y^n|x^n) = W(y_1|x_1) \cdots W(y_n|x_n),$$

with a channel $W : \mathcal{X} \to \mathcal{Y}$ which we identify with the DMC. It is well known [23] that the transmission capacity of this channel (with the strong converse proven by Wolfowitz [?]) is

$$C(W) = \max_{P \text{ p.d. on } \mathcal{X}} I(P;W).$$

Here $I(P;W) = H(PW) - H(W|P)$ is Shannon's mutual information, where $PW = \sum_{x \in \mathcal{X}} P(x) W(\cdot|x)$ is the output distribution on $\mathcal{Y}$, and $H(W|P) = \sum_{x \in \mathcal{X}} P(x) H(W(\cdot|x))$ is the conditional entropy of the channel for the input distribution $P$.
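As a rough numerical illustration of these definitions (not part of the original argument), the following Python sketch evaluates $I(P;W)$ for a binary symmetric channel. The function names, the crossover probability 0.1, and the grid search over input distributions are assumptions made for the example; entropies are taken in bits.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability vector."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

def mutual_information(P, W):
    """I(P;W) = H(PW) - H(W|P), where W[x, y] = W(y|x)."""
    P = np.asarray(P, dtype=float)
    W = np.asarray(W, dtype=float)
    output = P @ W  # output distribution PW
    conditional = sum(P[x] * entropy(W[x]) for x in range(len(P)))
    return entropy(output) - conditional

# Hypothetical example: binary symmetric channel with crossover 0.1.
bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])
i_uniform = mutual_information([0.5, 0.5], bsc)

# Crude estimate of C(W): maximize I(P;W) over a grid of input distributions.
capacity_estimate = max(
    mutual_information([p, 1.0 - p], bsc) for p in np.linspace(0.0, 1.0, 101)
)
```

For this symmetric channel the grid maximum is attained at the uniform input, so `capacity_estimate` agrees with `i_uniform`.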

Ahlswede and Dueck, considering not the problem that the receiver wants to recover a message (transmission problem), but wants to decide whether or not the sent message is identical to an arbitrarily chosen one (identification problem), defined an $(n, N, \lambda_1, \lambda_2)$ identification (ID) code to be a collection of pairs

$$\{(P_i, \mathcal{D}_i) : i = 1, \ldots, N\},$$

with probability distributions $P_i$ on $\mathcal{X}^n$ and $\mathcal{D}_i \subset \mathcal{Y}^n$, such that the error probabilities of first resp. second kind satisfy

$$P_i W^n(\mathcal{D}_i^c) = \sum_{x^n \in \mathcal{X}^n} P_i(x^n) W^n(\mathcal{D}_i^c|x^n) \leq \lambda_1,$$
$$P_j W^n(\mathcal{D}_i) = \sum_{x^n \in \mathcal{X}^n} P_j(x^n) W^n(\mathcal{D}_i|x^n) \leq \lambda_2,$$

for all $i, j = 1, \ldots, N$, $i \neq j$. Here $\mathcal{D}_i^c = \mathcal{Y}^n \setminus \mathcal{D}_i$ is the set complement of $\mathcal{D}_i$ in $\mathcal{Y}^n$, and

$$W^n(A|x^n) = \sum_{y^n \in A} W^n(y^n|x^n)$$

is a convenient shortcut for the probability of an event $A \subset \mathcal{Y}^n$ conditional on $x^n$. Define $N(n, \lambda_1, \lambda_2)$ to be the maximal $N$ such that an $(n, N, \lambda_1, \lambda_2)$ ID code exists.
With these definitions one has 

Theorem 1 (Ahlswede, Dueck [1]) For every $\lambda_1, \lambda_2 > 0$ and $\delta > 0$, and for every sufficiently large $n$

$$N(n, \lambda_1, \lambda_2) \geq \exp(\exp(n(C(W) - \delta))).$$



The work [3] is devoted to a comparably short, and conceptually simple proof of

Theorem 2: Let $\lambda_1, \lambda_2 > 0$ such that $\lambda_1 + \lambda_2 < 1$. Then for every $\delta > 0$ and every sufficiently large $n$

$$N(n, \lambda_1, \lambda_2) \leq \exp(\exp(n(C(W) + \delta))).$$

Note that for $\lambda_1 + \lambda_2 \geq 1$ no upper bound on $N(n, \lambda_1, \lambda_2)$ can hold: a successful strategy would be that the receiver ignores the actual signal, and to identify $i$ guesses YES with probability $1 - \lambda_1$, NO with probability $\lambda_1 \geq 1 - \lambda_2$.

The first proof of theorem 2 was given in [?], the method to be further extended in [?]. In [3] it is returned to the very first idea from [1], essentially to replace the distributions $P_i$ by uniform distributions on "small" subsets of $\mathcal{X}^n$, namely with cardinality slightly above $\exp(nC(W))$.

Löber [18] began the study of identification via quantum channels. Following his work, and after Holevo [14], we define a (discrete memoryless) classical-quantum channel (quantum channel for short) to be a map

$$W : \mathcal{X} \to \mathcal{S}(\mathcal{H}),$$

with $\mathcal{X}$ a finite set, as before, and $\mathcal{S}(\mathcal{H})$ the set of quantum states of the complex Hilbert space $\mathcal{H}$, which we assume to be finite dimensional. In the sequel, we shall use $a = |\mathcal{X}|$ and $d = \dim \mathcal{H}$. We identify $\mathcal{S}(\mathcal{H})$, as usual, with the set of density operators, i.e. the selfadjoint, positive semidefinite, linear operators on $\mathcal{H}$ with unit trace$^1$:

$$\mathcal{S}(\mathcal{H}) = \{\rho : \rho = \rho^* \geq 0, \ \mathrm{Tr}\,\rho = 1\}.$$

In the sequel we will write $W_x$ for the images $W(x)$ of the channel map.

Associated to $W$ is the channel map on $n$-blocks

$$W^n : \mathcal{X}^n \to \mathcal{S}(\mathcal{H}^{\otimes n}),$$

with

$$W^n_{x^n} = W_{x_1} \otimes \cdots \otimes W_{x_n}.$$

One can use quantum channels to transmit classical information, and Holevo [?] showed that the capacity is

$$C(W) = \max_{P \text{ p.d. on } \mathcal{X}} I(P;W).$$

Here $I(P;W) = H(PW) - H(W|P)$ is the von Neumann mutual information, with the output state $PW = \sum_{x \in \mathcal{X}} P(x) W_x$ on $\mathcal{H}$, and $H(W|P) = \sum_{x \in \mathcal{X}} P(x) H(W_x)$ the conditional entropy of the channel for the input distribution $P$. The only difference to Shannon's result is that here $H$ denotes the von Neumann entropy which is defined, for a state $\rho$, as

$$H(\rho) = -\mathrm{Tr}\,\rho \log \rho.$$

The strong converse for this situation was proved (independently) in [?] and [?].
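To make the quantum quantities concrete, here is a small Python sketch that computes the von Neumann entropy and $I(P;W)$ for a toy classical-quantum channel. It is only an illustration under assumptions of our own: the two-letter alphabet, the pure qubit states $|0\rangle$ and $|+\rangle$, and the function names are hypothetical choices, and logarithms are taken to base 2.

```python
import numpy as np

def von_neumann_entropy(rho):
    """H(rho) = -Tr rho log2 rho, computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop numerically zero eigenvalues
    return float(-np.sum(evals * np.log2(evals)))

def holevo_information(P, states):
    """I(P;W) = H(sum_x P(x) W_x) - sum_x P(x) H(W_x)."""
    average = sum(p * w for p, w in zip(P, states))
    conditional = sum(p * von_neumann_entropy(w) for p, w in zip(P, states))
    return von_neumann_entropy(average) - conditional

# Hypothetical channel: x = 0 -> |0><0|, x = 1 -> |+><+|.
ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2.0)
states = [np.outer(ket0, ket0), np.outer(ket_plus, ket_plus)]

chi = holevo_information([0.5, 0.5], states)
```

Because the two output states are pure but not orthogonal, `chi` lies strictly between 0 and 1 bit; for orthogonal pure states the same computation gives exactly 1 bit.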

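Quantum channels are a generalization of classical channels in the following sense: choose any orthonormal basis $(e_y : y \in \mathcal{Y})$ of the $|\mathcal{Y}|$-dimensional Hilbert space $\mathcal{H}$, and define for the classical channel $W : \mathcal{X} \to \mathcal{Y}$ the corresponding quantum channel $\widetilde{W} : \mathcal{X} \to \mathcal{S}(\mathcal{H})$ by

$$\widetilde{W}_x = \sum_{y \in \mathcal{Y}} W(y|x)\, |e_y\rangle\langle e_y|.$$

Obviously for another channel $V$ one has $\widetilde{V \times W} = \widetilde{V} \otimes \widetilde{W}$.

Regarding the decoding sets let $\mathcal{D} \subset \mathcal{Y}$, then the corresponding operator $D = \sum_{y \in \mathcal{D}} |e_y\rangle\langle e_y|$ satisfies for all $x$

$$W(\mathcal{D}|x) = \mathrm{Tr}(\widetilde{W}_x D).$$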

Observe that by this translation rule a partition of $\mathcal{Y}$ corresponds to a projection valued measure (PVM) on $\mathcal{H}$, i.e. a collection of mutually orthogonal projectors which sum to $\mathbb{1}$. Conversely, given any operator $D$ on $\mathcal{H}$ with $0 \leq D \leq \mathbb{1}$, define the function

$$\delta : \mathcal{Y} \to [0,1] \quad \text{by} \quad \delta(y) = \langle e_y | D | e_y \rangle.$$

Then for all $x$

$$\mathrm{Tr}(\widetilde{W}_x D) = \sum_{y \in \mathcal{Y}} \delta(y) W(y|x),$$

which implies that every quantum observation, i.e. a positive operator valued measure (POVM), of the states $\widetilde{W}_x$ can be simulated by a classical randomized decision rule on $\mathcal{Y}$. One consequence of this is that the transmission capacities of $W$ and of $\widetilde{W}$ are equal: $C(W) = C(\widetilde{W})$. Equally, also the identification capacities (whose definition in the quantum case is given below) coincide. For randomization at the decoder cannot improve either minimum error probability.

$^1$ See Davies for the mathematics to describe quantum systems.
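The translation between classical decoding sets and operators can be checked numerically. The sketch below is an illustration only: the 2-by-3 channel matrix, the helper names, and the randomly generated operator `D_general` are assumptions introduced for the example. It embeds a classical channel as diagonal states and verifies both directions of the correspondence.

```python
import numpy as np

# Hypothetical classical channel: W[x, y] = W(y|x), two inputs, three outputs.
W = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.6, 0.3]])

def embed(W):
    """Diagonal states sum_y W(y|x) |e_y><e_y| in the standard basis."""
    return [np.diag(row) for row in W]

def indicator_operator(subset, dim):
    """Projector D = sum_{y in subset} |e_y><e_y|."""
    D = np.zeros((dim, dim))
    for y in subset:
        D[y, y] = 1.0
    return D

states = embed(W)

# Direction 1: a decoding set {0, 2} becomes a projector with the same statistics.
D = indicator_operator({0, 2}, 3)
quantum_probs = [float(np.trace(s @ D)) for s in states]
classical_probs = [float(W[x, [0, 2]].sum()) for x in range(2)]

# Direction 2: a general 0 <= D <= 1 is simulated by delta(y) = <e_y|D|e_y>.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
G = A @ A.T                                  # positive semidefinite
D_general = G / np.linalg.eigvalsh(G).max()  # largest eigenvalue scaled to 1
delta = np.diag(D_general)
lhs = [float(np.trace(s @ D_general)) for s in states]
rhs = [float(W[x] @ delta) for x in range(2)]
```

Only the diagonal of `D_general` matters here because the embedded states are diagonal in the chosen basis.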

Abstractly, just given the states $W_x$, this situation occurs if they pairwise commute: for then they are simultaneously diagonalizable, hence the orthonormal basis $(e_y : y \in \mathcal{Y})$ arises.

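According to [18] an $(n, N, \lambda_1, \lambda_2)$ quantum identification (QID) code is a collection of pairs

$$\{(P_i, D_i) : i = 1, \ldots, N\},$$

with probability distributions $P_i$ on $\mathcal{X}^n$, and operators $D_i$ on $\mathcal{H}^{\otimes n}$ satisfying $0 \leq D_i \leq \mathbb{1}$, such that the error probabilities of first resp. second kind satisfy

$$\mathrm{Tr}\left(P_i W^n (\mathbb{1} - D_i)\right) = \mathrm{Tr}\left(\sum_{x^n \in \mathcal{X}^n} P_i(x^n) W^n_{x^n} (\mathbb{1} - D_i)\right) \leq \lambda_1,$$
$$\mathrm{Tr}\left(P_j W^n D_i\right) = \mathrm{Tr}\left(\sum_{x^n \in \mathcal{X}^n} P_j(x^n) W^n_{x^n} D_i\right) \leq \lambda_2,$$

for all $i, j = 1, \ldots, N$, $i \neq j$. Again, define $N(n, \lambda_1, \lambda_2)$ to be the maximal $N$ such that an $(n, N, \lambda_1, \lambda_2)$ QID code exists.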

This definition has a subtle problem: since the $D_i$ need not commute, it is possible that identifying for a message $i$ prohibits identification for $j$, as the corresponding POVMs $(D_i, \mathbb{1} - D_i)$ and $(D_j, \mathbb{1} - D_j)$ may be incompatible. To allow simultaneous identification of all messages we have to assume that the $D_i$ have a common refinement, i.e. there exists a POVM $(E_k : k = 1, \ldots, K)$ and subsets $I_i$ of $\{1, \ldots, K\}$ such that

$$D_i = \sum_{k \in I_i} E_k$$

for all $i$. In this case the QID code is called simultaneous, and $N_{\mathrm{sim}}(n, \lambda_1, \lambda_2)$ is the maximal $N$ such that a simultaneous $(n, N, \lambda_1, \lambda_2)$ quantum identification code exists. Clearly

$$N_{\mathrm{sim}}(n, \lambda_1, \lambda_2) \leq N(n, \lambda_1, \lambda_2).$$

In analogy to the above theorems it was proved:

Theorem 3 (Löber [18]) For every $\lambda_1, \lambda_2 > 0$ and $\delta > 0$, and for every sufficiently large $n$

$$N_{\mathrm{sim}}(n, \lambda_1, \lambda_2) \geq \exp(\exp(n(C(W) - \delta))).$$

On the other hand, let $\lambda_1, \lambda_2 > 0$ such that $\lambda_1 + \lambda_2 < 1$. Then for every $\delta > 0$ and every sufficiently large $n$

$$N_{\mathrm{sim}}(n, \lambda_1, \lambda_2) \leq \exp(\exp(n(C(W) + \delta))).$$

Looking at the examples given in [1], the simultaneity condition seems completely natural. But this need not always be the case.

Example 4: Modify the "sailors' wives" situation (Example 1 from [1]) as follows: the N sailors are not married
each to one wife but instead are all in love with a single 
girl. One day in a storm one sailor drowns, and his identity 
should be communicated home. The girl however is capri- 
cious to the degree that it is impossible to predict who 
is her sweetheart at a given moment: when the message 
about the drowned sailor arrives, she will only ask for her 
present sweetheart, and only she will ask. 

With our present approach we can get rid of the simultaneity condition in the converse (whereas by the above theorem identification codes approaching the capacity can be designed to be simultaneous: namely, by [1] for any sort of channel and a transmission code of rate $R$ for it, one can construct an ID code "on top" of the transmission code, and with identification rate $R$, asymptotically):

Theorem 5: Let $\lambda_1, \lambda_2 > 0$ such that $\lambda_1 + \lambda_2 < 1$. Then for every $\delta > 0$ and every sufficiently large $n$

$$N(n, \lambda_1, \lambda_2) \leq \exp(\exp(n(C(W) + \delta))).$$
The rest of the paper is divided into two major blocks: first, after a short review of the ideas from [3] in section II, the rest of the main text will be devoted to the proof of theorem 5 (as explained, this contains theorem 2 indeed as a special case), in section III.

The other block is the appendix, containing the funda- 
mentals of a theory of (selfadjoint) operator valued random 
variables. There the large deviation bounds to be used in 
the main text are derived. 

II. The classical case 

The core of the proof of theorem 2 in [3] is the following result about hypergraphs. Recall that a hypergraph is a pair $\Gamma = (\mathcal{V}, \mathcal{E})$ with a finite set $\mathcal{V}$ of vertices, and a finite set $\mathcal{E}$ of (hyper-)edges $E \subset \mathcal{V}$. We call $\Gamma$ $e$-uniform, if all its edges have cardinality $e$. For an edge $E \in \mathcal{E}$ denote the characteristic function of $E \subset \mathcal{V}$ by $1_E$.

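The starting point is a result from large deviation theory:

Lemma 6: For an i.i.d. sequence $Z_1, \ldots, Z_L$ of random variables with values in $[0,1]$ with expectation $\mathbb{E} Z_i = \mu$, and $0 < \epsilon < 1$

$$\Pr\left\{\frac{1}{L}\sum_{i=1}^L Z_i > (1+\epsilon)\mu\right\} \leq \exp(-L\, D((1+\epsilon)\mu \| \mu)),$$
$$\Pr\left\{\frac{1}{L}\sum_{i=1}^L Z_i < (1-\epsilon)\mu\right\} \leq \exp(-L\, D((1-\epsilon)\mu \| \mu)),$$

where $D(\alpha\|\beta)$ is the information divergence of the binary distributions $(\alpha, 1-\alpha)$ and $(\beta, 1-\beta)$. Since for $-\frac{1}{2} \leq x \leq \frac{1}{2}$, $D((1+x)\mu\|\mu) \geq \frac{1}{2\ln 2} x^2 \mu$, it follows that

$$\Pr\left\{\frac{1}{L}\sum_{i=1}^L Z_i \notin [(1-\epsilon)\mu, (1+\epsilon)\mu]\right\} \leq 2\exp\left(-L \cdot \frac{\epsilon^2 \mu}{2\ln 2}\right).$$

Lemma 7: Let $\Gamma = (\mathcal{V}, \mathcal{E})$ be an $e$-uniform hypergraph, and $P$ a probability distribution on $\mathcal{E}$. Define the probability distribution $Q$ on $\mathcal{V}$ by

$$Q(v) = \sum_{E \in \mathcal{E}} P(E) \frac{1}{e} 1_E(v),$$

and fix $\epsilon, \tau > 0$. Then there exist vertices $\mathcal{V}_0 \subset \mathcal{V}$ and edges $E_1, \ldots, E_L \in \mathcal{E}$ such that with

$$\bar{Q}(v) = \frac{1}{L}\sum_{i=1}^L \frac{1}{e} 1_{E_i}(v)$$

the following holds:

$$Q(\mathcal{V}_0) \leq \tau,$$
$$\forall v \in \mathcal{V} \setminus \mathcal{V}_0 \quad (1-\epsilon) Q(v) \leq \bar{Q}(v) \leq (1+\epsilon) Q(v),$$
$$L \leq 1 + \frac{|\mathcal{V}|}{e} \cdot \frac{2\ln 2 \log(2|\mathcal{V}|)}{\epsilon^2 \tau}.$$

Proof: See [3]. ■

For ease of application we formulate a slightly more general version of this:

Lemma 8: Let $\Gamma = (\mathcal{V}, \mathcal{E})$ be a hypergraph, with a measure $Q_E$ on each edge $E$, such that $Q_E(v) \leq \eta$ for all $E$, $v \in E$. For a probability distribution $P$ on $\mathcal{E}$ define

$$Q = \sum_{E \in \mathcal{E}} P(E) Q_E,$$

and fix $\epsilon, \tau > 0$. Then there exist vertices $\mathcal{V}_0 \subset \mathcal{V}$ and edges $E_1, \ldots, E_L \in \mathcal{E}$ such that with

$$\bar{Q} = \frac{1}{L}\sum_{i=1}^L Q_{E_i}$$

the following holds:

$$Q(\mathcal{V}_0) \leq \tau,$$
$$\forall v \in \mathcal{V} \setminus \mathcal{V}_0 \quad (1-\epsilon) Q(v) \leq \bar{Q}(v) \leq (1+\epsilon) Q(v),$$
$$L \leq 1 + \eta |\mathcal{V}| \cdot \frac{2\ln 2 \log(2|\mathcal{V}|)}{\epsilon^2 \tau}.$$

■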

The interpretation of this result is as follows: $Q$ is the expectation measure of the measures $Q_E$, which are sampled by the $Q_{E_i}$. The lemma says how close the sampling average $\bar{Q}$ can be to $Q$. In fact, assuming $Q_E(E) = q \leq 1$ for all $E \in \mathcal{E}$, one easily sees that

$$\|Q - \bar{Q}\|_1 \leq 2\epsilon + 2\tau.$$

The idea for the proof of theorem 2 is now: to replace the (in principle) arbitrary distributions $P_i$ on $\mathcal{X}^n$ of an $(n, N, \lambda_1, \lambda_2)$ ID code $\{(P_i, \mathcal{D}_i) : i = 1, \ldots, N\}$, by uniform distributions on subsets of $\mathcal{X}^n$, with cardinality bounded essentially by $\exp(nC(W))$. The condition is that the corresponding output distributions are close, so the resulting ID code will be a bit worse, but still nontrivial. This is done with the help of the covering lemma applied to typical sequences in $\mathcal{Y}^n$ as vertices, and sets of induced typical sequences as edges. For details see [3].
III. Proof of theorem 5

It was already pointed out in the previous section that the main idea of the converse proof is to replace the arbitrary code distributions $P_i$ by regularized approximations, the quality of approximation being measured by the $\|\cdot\|_1$-distance of the output distributions.

Hence, to extend the method to quantum channels we have to use the $\|\cdot\|_1$-distance of the output quantum states, and we have to find quantum versions of the lemmas 6 and 8.

Define a quantum hypergraph to be a pair $(\mathcal{V}, \mathcal{E})$ with a finite dimensional Hilbert space $\mathcal{V}$ and a (finite) collection $\mathcal{E}$ of operators $E$ on $\mathcal{V}$, $0 \leq E \leq \mathbb{1}$ (see the discussion in subsection F of the appendix).



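Analogous to lemma 6 is theorem 19 (in the appendix), the analog of lemma 8 is

Lemma 9: Let $(\mathcal{V}, \mathcal{E})$ be a quantum hypergraph such that $E \leq \eta \mathbb{1}$ for all $E \in \mathcal{E}$. For a probability distribution $P$ on $\mathcal{E}$ define

$$\rho = \sum_{E \in \mathcal{E}} P(E) E,$$

and fix $\epsilon, \tau > 0$. Then there exists a subspace $\mathcal{V}_0 < \mathcal{V}$ and $E_1, \ldots, E_L \in \mathcal{E}$ such that with

$$\bar{\rho} = \frac{1}{L}\sum_{i=1}^L E_i$$

and denoting by $\Pi_0$ and $\Pi_1$ the orthogonal projections onto $\mathcal{V}_0$ and its complement, respectively, the following holds:

$$\mathrm{Tr}\,\rho\Pi_0 \leq \tau,$$
$$(1-\epsilon)\Pi_1 \rho \Pi_1 \leq \Pi_1 \bar{\rho} \Pi_1 \leq (1+\epsilon)\Pi_1 \rho \Pi_1,$$
$$L \leq 1 + \eta(\dim \mathcal{V}) \frac{2\ln 2 \log(2\dim \mathcal{V})}{\epsilon^2 \tau}.$$

Proof: Diagonalize $\rho$, i.e. $\rho = \sum_j r_j \pi_j$, and let

$$\Pi_0 = \sum_{j:\, r_j < \tau/\dim \mathcal{V}} \pi_j, \qquad \Pi_1 = \mathbb{1} - \Pi_0.$$

Clearly $\mathrm{Tr}(\rho\Pi_0) \leq \tau$.

Now consider the quantum hypergraph $(\mathcal{V}_0^\perp, \{\Pi_1 E \Pi_1 : E \in \mathcal{E}\})$, whose edges also obey the upper bound $\eta\mathbb{1}$, and which has edge average $\Pi_1 \rho \Pi_1 \geq \frac{\tau}{\dim \mathcal{V}} \Pi_1$.

Now let $X_1, \ldots, X_L$ be i.i.d. with $\Pr\{X_i = \Pi_1 E \Pi_1\} = P(E)$. Then we can estimate with theorem 19, applied to the variables $\eta^{-1} X_i$:

$$\Pr\left\{\frac{1}{L}\sum_{i=1}^L X_i \notin [(1-\epsilon)\Pi_1\rho\Pi_1, (1+\epsilon)\Pi_1\rho\Pi_1]\right\} \leq 2(\dim \mathcal{V})\exp\left(-L \frac{\epsilon^2 \tau}{2\eta \dim \mathcal{V} \ln 2}\right),$$

which is smaller than 1 if

$$L > \eta(\dim \mathcal{V}) \frac{2\ln 2 \log(2\dim \mathcal{V})}{\epsilon^2 \tau},$$

in which case the desired covering exists. ■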
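For application, assume $\mathrm{Tr}\,E = q \leq 1$ for all $E \in \mathcal{E}$. Then a consequence of the estimates of the lemma is

$$\|\rho - \bar{\rho}\|_1 \leq (\epsilon + \tau) + \sqrt{8(\epsilon + \tau)}.$$

To see this observe

$$\|\rho - \bar{\rho}\|_1 \leq \|\rho - \Pi_1\rho\Pi_1\|_1 + \|\Pi_1\rho\Pi_1 - \Pi_1\bar{\rho}\Pi_1\|_1 + \|\bar{\rho} - \Pi_1\bar{\rho}\Pi_1\|_1 \leq \tau + \epsilon + \sqrt{8(\epsilon + \tau)},$$

where the three terms are estimated as follows: for the first note $\rho - \Pi_1\rho\Pi_1 = \Pi_0\rho\Pi_0$, and apply $\mathrm{Tr}\,\rho\Pi_0 \leq \tau$. For the second use the lemma, and for the third use $\mathrm{Tr}\,\bar{\rho}\Pi_0 \leq \epsilon + \tau$ in lemma V.9 of [25].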

We collect here a number of standard facts about types and typical sequences (cf. [6]):

Empirical distributions (aka types): For a probability distribution $P$ on $\mathcal{X}$ define

$$\mathcal{T}_P^n = \{x^n \in \mathcal{X}^n : \forall x \in \mathcal{X} \ N(x|x^n) = nP(x)\},$$

where $N(x|x^n)$ counts the number of $x$'s in $x^n$. If this set is nonempty, we call $P$ an $n$-distribution, or type, or empirical distribution. Notice that the number of types is

$$\binom{n + a - 1}{a - 1} \leq (n+1)^a.$$
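The count of types can be confirmed by brute force for small parameters. The following Python sketch is an illustration only (the function names are ours): it enumerates all vectors of letter counts summing to $n$ over an alphabet of size $a$ and compares with the binomial formula and the bound $(n+1)^a$.

```python
from itertools import product
from math import comb

def count_types(n, a):
    """Brute-force count of count vectors (N(x|x^n))_x that sum to n."""
    return sum(1 for counts in product(range(n + 1), repeat=a) if sum(counts) == n)

def types_formula(n, a):
    """Closed form: binomial(n + a - 1, a - 1)."""
    return comb(n + a - 1, a - 1)

# Compare enumeration, closed form, and the polynomial upper bound.
table = [
    (n, a, count_types(n, a), types_formula(n, a), (n + 1) ** a)
    for n in range(1, 7)
    for a in range(1, 4)
]
```

The point used in the proof is that this number grows only polynomially in $n$, whereas the number of sequences grows exponentially.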



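Typical sequences: For $\alpha > 0$ and any distribution $P$ on $\mathcal{X}$ define the following set of typical sequences:

$$\mathcal{T}_{P,\alpha}^n = \left\{x^n : \forall x \ |N(x|x^n) - nP(x)| \leq \alpha\sqrt{P(x)(1-P(x))}\sqrt{n}\right\} = \bigcup_{Q \text{ s.t. } |Q(x) - P(x)| \leq \alpha\sqrt{P(x)(1-P(x))/n}} \mathcal{T}_Q^n.$$

Note that by Chebyshev inequality $P^{\otimes n}(\mathcal{T}_{P,\alpha}^n) \geq 1 - \frac{a}{\alpha^2}$.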

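From [25] recall the following facts about the quantum version of the previous constructions:

Typical subspace: For $\rho$ on $\mathcal{H}$ and $\alpha > 0$ there exists an orthogonal subspace projector $\Pi_{\rho,\alpha}^n$ commuting with $\rho^{\otimes n}$, and satisfying

$$\mathrm{Tr}(\rho^{\otimes n}\Pi_{\rho,\alpha}^n) \geq 1 - \frac{d}{\alpha^2}, \qquad (1)$$
$$\mathrm{Tr}\,\Pi_{\rho,\alpha}^n \leq \exp(nH(\rho) + Kd\alpha\sqrt{n}),$$
$$\Pi_{\rho,\alpha}^n \rho^{\otimes n} \Pi_{\rho,\alpha}^n \geq \exp(-nH(\rho) - Kd\alpha\sqrt{n})\,\Pi_{\rho,\alpha}^n.$$

Conditional typical subspace: For $x^n \in \mathcal{T}_P^n$ and $\alpha > 0$ there exists an orthogonal subspace projector $\Pi_{W,\alpha}^n(x^n)$ commuting with $W_{x^n}^n$, and satisfying

$$\mathrm{Tr}(W_{x^n}^n \Pi_{W,\alpha}^n(x^n)) \geq 1 - \frac{ad}{\alpha^2}, \qquad (2)$$
$$\mathrm{Tr}\,\Pi_{W,\alpha}^n(x^n) \leq \exp(nH(W|P) + Kda\alpha\sqrt{n}),$$
$$\Pi_{W,\alpha}^n(x^n) W_{x^n}^n \Pi_{W,\alpha}^n(x^n) \leq \exp(-nH(W|P) + Kda\alpha\sqrt{n})\,\Pi_{W,\alpha}^n(x^n),$$
$$\mathrm{Tr}(W_{x^n}^n \Pi_{PW,\alpha\sqrt{a}}^n) \geq 1 - \frac{ad}{\alpha^2}. \qquad (3)$$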

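Proof of theorem 5: We follow the strategy of the proof for theorem 2: consider an $(n, N, \lambda_1, \lambda_2)$ QID code $\{(P_i, D_i) : i = 1, \ldots, N\}$, $\lambda_1 + \lambda_2 = 1 - \lambda < 1$, and concentrate on one $P_i$ for the moment. Introduce, for empirical distributions $T$ on $\mathcal{X}$, the probability distributions

$$P_i^T(x^n) = \frac{P_i(x^n)}{P_i(\mathcal{T}_T^n)} \quad \text{for } x^n \in \mathcal{T}_T^n,$$

extended by 0 to $\mathcal{X}^n$. For $x^n \in \mathcal{T}_T^n$ and with

$$\alpha = \frac{\sqrt{600ad}}{\lambda}$$

construct the conditional typical projector $\Pi_{W,\alpha}^n(x^n)$, and the typical projector $\Pi_{TW,\alpha\sqrt{a}}^n$. Define the operators

$$Q_{x^n} = \Pi_{TW,\alpha\sqrt{a}}^n \Pi_{W,\alpha}^n(x^n) W_{x^n}^n \Pi_{W,\alpha}^n(x^n) \Pi_{TW,\alpha\sqrt{a}}^n,$$

and note that

$$\|Q_{x^n} - W_{x^n}^n\|_1 \leq \frac{\lambda}{6},$$

by equations (2) and (3), and lemma V.9 of [25].

Now we apply lemma 9 with $\epsilon = \tau = \lambda^2/1200$ to the quantum hypergraph with the range of $\Pi_{TW,\alpha\sqrt{a}}^n$ as vertex space and edges

$$\Pi_{TW,\alpha\sqrt{a}}^n \Pi_{W,\alpha}^n(x^n) W_{x^n}^n \Pi_{W,\alpha}^n(x^n) \Pi_{TW,\alpha\sqrt{a}}^n, \quad x^n \in \mathcal{T}_T^n.$$

Combining we get an $L$-distribution $\bar{P}_i^T$ with

$$\|P_i^T Q - \bar{P}_i^T Q\|_1 \leq \frac{\lambda}{6},$$
$$L \leq \exp(nI(T;W) + O(\sqrt{n})) \leq \exp(nC(W) + O(\sqrt{n})),$$

where the constants depend explicitly on $\alpha, \delta, \tau$. By construction we get

$$\|P_i^T W^n - \bar{P}_i^T W^n\|_1 \leq \frac{\lambda}{3}.$$

By the proof of lemma 9 we can choose $L = \exp(nC(W) + O(\sqrt{n}))$, independent of $i$ and $T$.

Now choose a $K$-distribution $R$ on the set of all empirical distributions such that

$$\sum_{T \text{ emp. distr.}} |P_i(\mathcal{T}_T^n) - R(T)| \leq \frac{\lambda}{3},$$

which is possible for

$$K = \lceil 3(n+1)^{|\mathcal{X}|}/\lambda \rceil.$$

Defining then

$$\bar{P}_i = \sum_{T \text{ emp. distr.}} R(T)\bar{P}_i^T$$

we deduce finally

$$\frac{1}{2}\|P_i W^n - \bar{P}_i W^n\|_1 \leq \frac{\lambda}{3}.$$

Since for every operator $D$ on $\mathcal{H}^{\otimes n}$, $0 \leq D \leq \mathbb{1}$

$$|\mathrm{Tr}(P_i W^n D) - \mathrm{Tr}(\bar{P}_i W^n D)| \leq \frac{1}{2}\|P_i W^n - \bar{P}_i W^n\|_1$$

the collection $\{(\bar{P}_i, D_i) : i = 1, \ldots, N\}$ is indeed an $(n, N, \lambda_1 + \lambda/3, \lambda_2 + \lambda/3)$ QID code.

The proof is concluded by two observations: because of $\lambda_1 + \lambda_2 + 2\lambda/3 < 1$ we have $\bar{P}_i \neq \bar{P}_j$ for $i \neq j$. Since the $\bar{P}_i$ however are $KL$-distributions, we find

$$N \leq |\mathcal{X}^n|^{KL} = \exp(n\log|\mathcal{X}| \cdot KL) \leq \exp(\exp(n(C(W) + \delta))),$$

the last if only $n$ is large enough. ■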
We note that we actually proved the upper bound $C(W)$ to the resolution of a discrete memoryless quantum channel, in the following sense:

Definition 10: Let $W$ be any quantum channel, i.e. a family $W = (W^1, W^2, \ldots)$ of maps

$$W^n : \mathcal{X}^n \to \mathcal{S}(\mathcal{H}^{\otimes n}).$$

A number $R$ is called $\epsilon$-achievable resolution rate if for all $\delta > 0$ there is $n_0$ such that for all $n \geq n_0$ and all probability distributions $P^n$ on $\mathcal{X}^n$ there is an $M$-distribution $Q^n$ on $\mathcal{X}^n$ with the properties

$$M \leq \exp(n(R + \delta)) \quad \text{and} \quad \|P^n W^n - Q^n W^n\|_1 \leq \epsilon.$$

Define

$$S_\epsilon = \inf\{R : R \text{ is } \epsilon\text{-achievable resolution rate}\},$$

the channel's $\epsilon$-resolution.

Observe that this goes beyond the definition of [18], where the resolution was a function of a measurement process $E$:

Definition 11 (Löber [18]) Let $E = (E^1, E^2, \ldots)$ be a sequence of POVMs $E^n$ on $\mathcal{H}^{\otimes n}$, and adopt the notations of definition 10. A number $R$ is called $\epsilon$-achievable resolution rate for $E$ if for all $\delta > 0$ there is $n_0$ such that for all $n \geq n_0$ and all probability distributions $P^n$ on $\mathcal{X}^n$ there is an $M$-distribution $Q^n$ on $\mathcal{X}^n$ with the properties

$$M \leq \exp(n(R + \delta)) \quad \text{and} \quad d_{E^n}(P^n W^n, Q^n W^n) \leq \epsilon,$$

where $d_E(\rho, \sigma)$ is the total variational distance of the two output distributions generated by applying $E$ to states $\rho$, $\sigma$, respectively.

Define

$$S_\epsilon(E) = \inf\{R : R \text{ is } \epsilon\text{-achievable resolution rate for } E\},$$

the channel's $\epsilon$-resolution for $E$.

In general

$$S_\epsilon(E) \leq S_\epsilon.$$

We do not know if there is an example of a channel such that

$$\sup_E S_\epsilon(E) < S_\epsilon.$$
Appendix
Operator valued random variables 

A. Introduction 

The theory of real random variables provides the frame- 
work of much of modern probability theory, such as laws of 
large numbers, limit theorems, and probability estimates 
for 'deviations', when sums of independent random vari- 
ables are involved. However several authors have started 
to develop analogous theories for the case that the alge- 
braic structure of the reals is substituted by more general 
structures such as groups, vector spaces, etc., see for example [?].

In the present work we focus on a structure that has vital 
interest in quantum probability theory, namely the algebra 
of operators on a (complex) Hilbert space, and in particular
the real vector space of selfadjoint operators therein which 
can be regarded as a partially ordered generalization of the 
reals (as embedded in the complex numbers). In particu- 
lar it makes sense to discuss probability estimates as the 
Markov and Chebyshev inequality (subsection C), and in 
fact one can even generalize the exponentially good esti- 
mates for large deviations by the so-called Bernstein trick 
which yield the famous Chernoff bounds (subsection D). 

Otherwise the plan of this appendix is as follows: sub- 
section B collects basic definitions and notation we employ, 
and some facts from the theory of operator and trace in- 
equalities, after the central subsections C and D we collect 
a number of plausible conjectures (subsection E), and close 
with an application to the noncommutative generalization 
of the covering problem for hypergraphs, in subsection F. 

B. Basic facts and definitions 

We will study random variables $X : \Omega \to \mathfrak{A}_s$, where $\mathfrak{A}_s = \{A \in \mathfrak{A} : A = A^*\}$ is the selfadjoint part of the C*-algebra $\mathfrak{A}$, which is a real vector space. Usually we will restrict our attention to the most interesting and in a sense generic case of the full operator algebra $\mathcal{L}(\mathcal{H})$ of the complex Hilbert space $\mathcal{H}$. Throughout the paper we denote $d = \dim \mathcal{H}$, which we assume to be finite. In the general case $d = \mathrm{Tr}\,\mathbb{1}$, and $\mathfrak{A}$ can be embedded into $\mathcal{L}(\mathbb{C}^d)$ as an algebra, preserving the trace.

The real cone $\mathfrak{A}_+ = \{A \in \mathfrak{A} : A = A^* \geq 0\}$ induces a partial order $\leq$ in $\mathfrak{A}_s$, which will be the main object of interest in what follows. Let us introduce some convenient notation: for $A, B \in \mathfrak{A}_s$ the closed interval $[A, B]$ is defined as

$$[A, B] = \{X \in \mathfrak{A}_s : A \leq X \leq B\}.$$

(Similarly open and halfopen intervals $(A, B)$, $[A, B)$, etc.).

For simplicity we will assume that the space $\Omega$ on which the random variables live is discrete.

Some remarks on the operator order:

(A) $\leq$ is not a total order unless $\mathfrak{A} = \mathbb{C}$, in which case $\mathfrak{A}_s = \mathbb{R}$. Thus in this case (which we will refer to as the classical case) the theory developed below reduces to the study of real random variables.

(B) $A \geq 0$ is equivalent to saying that all eigenvalues of $A$ are nonnegative. These are $d$ nonlinear inequalities. However from the alternative characterization

$$A \geq 0 \iff \forall \rho \text{ density operator } \mathrm{Tr}(\rho A) \geq 0 \iff \forall \pi \text{ one-dim. projector } \mathrm{Tr}(\pi A) \geq 0$$

we see that this is equivalent to infinitely many linear inequalities, which is better adapted to the vector space structure of $\mathfrak{A}_s$.

(C) The operator mappings $A \mapsto A^s$ (for $s \in [0,1]$) and $A \mapsto \log A$ are defined on $\mathfrak{A}_+$, and both are operator monotone and operator concave.

In contrast, the mappings $A \mapsto A^s$ (for $s > 2$) and $A \mapsto \exp A$ are neither operator monotone nor operator convex. Interestingly, $A^s$ for $s \in [1,2]$ is operator convex (though not operator monotone).

All this follows from Löwner's theorem [19], a good account of which is given in Donoghue's book [8], and a characterization of operator convex functions due to Hansen and Pedersen [13].

(D) Note however that the mapping $A \mapsto \mathrm{Tr}\exp A$ is monotone and convex: see Lieb [17].

(E) Golden-Thompson-inequality ([?], [?]): for $A, B \in \mathfrak{A}_s$

$$\mathrm{Tr}\exp(A + B) \leq \mathrm{Tr}((\exp A)(\exp B)).$$
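Two of these remarks are easy to probe numerically. The Python sketch below is an illustration under assumptions of our own (the specific 2-by-2 matrices, the random Hermitian test matrices, and the helper names are hypothetical): it exhibits $A \leq B$ with $A^2 \not\leq B^2$, in line with remark (C), and checks the Golden-Thompson inequality of remark (E) on random samples. A numerical check of this kind is of course not a proof.

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """True if the Hermitian matrix M has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(M).min() >= -tol)

def expm_hermitian(H):
    """exp(H) for Hermitian H via its eigendecomposition."""
    evals, U = np.linalg.eigh(H)
    return (U * np.exp(evals)) @ U.conj().T

# Remark (C): squaring is not operator monotone.
A = np.array([[1.0, 1.0], [1.0, 1.0]])
B = np.array([[2.0, 1.0], [1.0, 1.0]])
order_holds = is_psd(B - A)                  # A <= B
square_order_holds = is_psd(B @ B - A @ A)   # A^2 <= B^2 ?

# Remark (E): Tr exp(X + Y) <= Tr(exp(X) exp(Y)) on random Hermitian pairs.
rng = np.random.default_rng(1)

def random_hermitian(d):
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

gaps = []
for _ in range(100):
    X, Y = random_hermitian(3), random_hermitian(3)
    lhs = np.trace(expm_hermitian(X + Y)).real
    rhs = np.trace(expm_hermitian(X) @ expm_hermitian(Y)).real
    gaps.append(rhs - lhs)
```

Here `order_holds` is true while `square_order_holds` is false, since $B^2 - A^2$ has determinant $-1$; all entries of `gaps` come out nonnegative up to rounding.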

C. Markov and Chebyshev inequality 

Theorem 12 (Markov inequality) Let $X$ a random variable with values in $\mathfrak{A}_+$ and expectation $M = \mathbb{E}X = \sum_x \Pr\{X = x\}\,x$, and $A \geq 0$ (i.e. $A \in \mathcal{L}(\mathcal{H})_+$). Then

$$\Pr\{X \not\leq A\} \leq \mathrm{Tr}(MA^{-1}).$$

Proof: We may assume that the support of $A$ contains the support of $M$, otherwise the theorem is trivial. Consider the positive random variable

$$Y = A^{-1/2} X A^{-1/2},$$

which has expectation $\mathbb{E}Y = N = A^{-1/2} M A^{-1/2}$. Since the events $\{X \leq A\}$ and $\{Y \leq \mathbb{1}\}$ coincide we have to show that

$$\Pr\{Y \not\leq \mathbb{1}\} \leq \mathrm{Tr}\,N.$$

This is seen as follows:

$$N = \sum_y \Pr\{Y = y\}\,y \geq \sum_{y \not\leq \mathbb{1}} \Pr\{Y = y\}\,y.$$

Taking traces, and observing that a positive operator which is not less than or equal $\mathbb{1}$ must have trace at least 1, we find

$$\mathrm{Tr}\,N \geq \sum_{y \not\leq \mathbb{1}} \Pr\{Y = y\}\,\mathrm{Tr}\,y \geq \sum_{y \not\leq \mathbb{1}} \Pr\{Y = y\} = \Pr\{Y \not\leq \mathbb{1}\},$$

which is what we wanted. ■
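The operator Markov inequality can be evaluated exactly for a random variable with finite support. The sketch below is a hedged illustration: the support of 50 random positive semidefinite 3-by-3 matrices, the Dirichlet weights, the helper names, and the choices of $A$ are all assumptions made for the example.

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """True if the symmetric matrix M has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(M).min() >= -tol)

rng = np.random.default_rng(2)
d, support_size = 3, 50

# Hypothetical discrete random variable X with positive semidefinite values.
values = []
for _ in range(support_size):
    G = rng.normal(size=(d, d))
    values.append(G @ G.T / d)
probs = rng.dirichlet(np.ones(support_size))
M = sum(p * x for p, x in zip(probs, values))  # expectation EX

def markov_check(A):
    """Return (Pr{X not<= A}, Tr(M A^{-1})) for an invertible A >= 0."""
    violation = sum(p for p, x in zip(probs, values) if not is_psd(A - x))
    bound = float(np.trace(M @ np.linalg.inv(A)))
    return float(violation), bound

thresholds = [c * np.eye(d) for c in (1.0, 3.0, 10.0)] + [np.diag([2.0, 4.0, 8.0])]
results = [markov_check(A) for A in thresholds]
```

For small thresholds the bound exceeds 1 and is vacuous; it becomes informative once $A$ is large compared with the expectation $M$.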
Remark 13: In the case of $\mathcal{H} = \mathbb{C}$ the theorem reduces to the well known Markov inequality for nonnegative real random variables. One can easily see that like in this classical case the inequality of the theorem is optimal in the sense that there are examples when it is assumed with equality.

If we assume knowledge about the second moment of $X$ we can prove

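Theorem 14 (Chebyshev inequality) Let $X$ a random variable with values in $\mathfrak{A}_s$, expectation $M = \mathbb{E}X$, and variance $\mathrm{Var}\,X = S^2 = \mathbb{E}((X - M)^2) = \mathbb{E}(X^2) - M^2$. For $\Delta \geq 0$

$$\Pr\{|X - M| \not\leq \Delta\} \leq \mathrm{Tr}(S^2\Delta^{-2}).$$

Proof: Observing

$$|X - M| \leq \Delta \Longleftarrow (X - M)^2 \leq \Delta^2$$

(because $\sqrt{\cdot}$ is operator monotone, see section B, (C)) we find

$$\Pr\{|X - M| \not\leq \Delta\} \leq \Pr\{(X - M)^2 \not\leq \Delta^2\} \leq \mathrm{Tr}(S^2\Delta^{-2})$$

(by theorem 12). ■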
Remark 15: If $X$, $Y$ are independent, then $\mathrm{Var}(X + Y) = \mathrm{Var}\,X + \mathrm{Var}\,Y$. The calculation is the same as in the classical case, but one has to take care of the noncommutativity.

Corollary 16 (Weak law of large numbers) Let $X, X_1, \ldots, X_n$ i.i.d. random variables with $\mathbb{E}X = M$, $\mathrm{Var}\,X = S^2$, and $\Delta \geq 0$. Then

$$\Pr\left\{\frac{1}{n}\sum_{i=1}^n X_i \notin [M - \Delta, M + \Delta]\right\} \leq \frac{1}{n}\mathrm{Tr}(S^2\Delta^{-2}),$$
$$\Pr\left\{\sum_{i=1}^n X_i \notin [nM - \Delta\sqrt{n}, nM + \Delta\sqrt{n}]\right\} \leq \mathrm{Tr}(S^2\Delta^{-2}).$$

Proof: Observe that $Y \notin [M - \Delta, M + \Delta]$ is equivalent to $|Y - M| \not\leq \Delta$, and apply the previous theorem. ■

D. Large deviations and Bernstein trick 

Lemma 17: For a random variable $Y$, $B \in \mathfrak{A}_s$, and $T \in \mathfrak{A}$ such that $T^*T > 0$

$$\Pr\{Y \not\leq B\} \leq \mathrm{Tr}(\mathbb{E}\exp(TYT^* - TBT^*)).$$

Proof: A direct calculation:

$$\Pr\{Y \not\leq B\} = \Pr\{Y - B \not\leq 0\}$$
$$= \Pr\{TYT^* - TBT^* \not\leq 0\}$$
$$= \Pr\{\exp(TYT^* - TBT^*) \not\leq \mathbb{1}\}$$
$$\leq \mathrm{Tr}(\mathbb{E}\exp(TYT^* - TBT^*)).$$

Here the second line is because the mapping $X \mapsto TXT^*$ is bijective and preserves the order, the third because for commuting operators $A$, $B$, $A \leq B$ is equivalent to $\exp A \leq \exp B$, and the last line by theorem 12. ■
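Theorem 18: Let $X, X_1, \ldots, X_n$ i.i.d. random variables with values in $\mathfrak{A}_s$, $A \in \mathfrak{A}_s$. Then for $T \in \mathfrak{A}$, $T^*T > 0$

$$\Pr\left\{\sum_{i=1}^n X_i \not\leq nA\right\} \leq d \cdot \|\mathbb{E}\exp(TXT^* - TAT^*)\|^n.$$

Proof: Using the previous lemma with $Y = \sum_{i=1}^n X_i$ and $B = nA$ we find

$$\Pr\left\{\sum_{i=1}^n X_i \not\leq nA\right\} \leq \mathrm{Tr}\left(\mathbb{E}\exp\left(\sum_{i=1}^n T(X_i - A)T^*\right)\right)$$
$$= \mathbb{E}\,\mathrm{Tr}\exp\left(\sum_{i=1}^n T(X_i - A)T^*\right)$$
$$\leq \mathbb{E}\,\mathrm{Tr}\left[\exp\left(\sum_{i=1}^{n-1} T(X_i - A)T^*\right)\exp(T(X_n - A)T^*)\right]$$
$$= \mathbb{E}_{1\ldots n-1}\mathrm{Tr}\left[\exp\left(\sum_{i=1}^{n-1} T(X_i - A)T^*\right) \cdot \mathbb{E}\exp(T(X_n - A)T^*)\right]$$
$$\leq \|\mathbb{E}\exp(T(X_n - A)T^*)\| \cdot \mathbb{E}_{1\ldots n-1}\mathrm{Tr}\exp\left(\sum_{i=1}^{n-1} T(X_i - A)T^*\right)$$
$$\leq \ldots \leq d \cdot \|\mathbb{E}\exp(T(X_n - A)T^*)\|^n.$$

Here everything is straightforward, except for the third line which is by the Golden-Thompson-inequality (section B, (E)). ■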

The problem is now to minimize $\|\mathbb{E}\exp(TXT^* - TAT^*)\|$ with respect to $T$. Observe that without loss of generality we may assume that $T$ is selfadjoint, because of the polar decomposition $T = U \cdot |T|$, with a unitary $U$. The case we will pursue further is that of a bounded random variable. Introducing the binary I-divergence

$$D(u\|v) = u(\log u - \log v) + (1 - u)(\log(1 - u) - \log(1 - v))$$

we find

Theorem 19 (Chernoff) Let $X, X_1, \ldots, X_n$ i.i.d. random variables with values in $[0, \mathbb{1}] \subset \mathfrak{A}_s$, $\mathbb{E}X \leq m\mathbb{1}$, $A \geq a\mathbb{1}$, $1 \geq a \geq m \geq 0$. Then

$$\Pr\left\{\sum_{i=1}^n X_i \not\leq nA\right\} \leq d \cdot \exp(-nD(a\|m)).$$

Similarly, if $\mathbb{E}X \geq m\mathbb{1}$, $A \leq a\mathbb{1}$, $0 \leq a \leq m \leq 1$. Then

$$\Pr\left\{\sum_{i=1}^n X_i \not\geq nA\right\} \leq d \cdot \exp(-nD(a\|m)).$$

As a consequence we get, for $\mathbb{E}X = M \geq \mu\mathbb{1}$ and $0 \leq \epsilon \leq \frac{1}{2}$,

$$\Pr\left\{\frac{1}{n}\sum_{i=1}^n X_i \notin [(1-\epsilon)M, (1+\epsilon)M]\right\} \leq 2d \cdot \exp\left(-n \cdot \frac{\epsilon^2\mu}{2\ln 2}\right).$$

Proof: The second part follows from the first by considering $Y_i = \mathbb{1} - X_i$, and the observation that $D(a\|m) = D(1-a\|1-m)$.

To prove it we apply theorem 18 with $T = \sqrt{t}\,\mathbb{1}$:

$$\Pr\left\{\sum_{i=1}^n X_i \not\leq nA\right\} \leq \Pr\left\{\sum_{i=1}^n X_i \not\leq na\mathbb{1}\right\} \leq d \cdot \|\mathbb{E}\exp(tX)\exp(-ta)\|^n.$$

Now using

$$\exp(tX) - \mathbb{1} \leq X(\exp(t) - 1)$$

(which follows from the validity of the estimate for real $x$, $x \in [0,1]$:

$$\frac{\exp(tx) - 1}{x} \leq \frac{\exp(t) - 1}{1},$$

which in turn is just the convexity of $\exp$) we find

$$\mathbb{E}\exp(tX) \leq \mathbb{1} + \mathbb{E}X(\exp(t) - 1) \leq (1 - m + m\exp t)\mathbb{1}.$$

Hence

$$\|\mathbb{E}\exp(tX)\exp(-ta)\| \leq (1 - m + m\exp t)\exp(-at),$$

and choosing

$$t = \log\left(\frac{a}{m} \cdot \frac{1-m}{1-a}\right) > 0$$

the right hand side becomes exactly $\exp(-D(a\|m))$.
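The final optimization step can be verified with a few lines of Python. This is a sketch under stated assumptions: natural logarithms are used for both $\exp$ and $D$ (the identity holds whenever both use the same base), and the values $a = 0.6$, $m = 0.3$ and the function names are arbitrary choices for the example.

```python
import math

def binary_divergence(u, v):
    """D(u||v) for binary distributions, with natural logarithms."""
    return u * math.log(u / v) + (1 - u) * math.log((1 - u) / (1 - v))

def moment_bound(t, a, m):
    """The scalar bound (1 - m + m e^t) e^{-a t} from the proof."""
    return (1 - m + m * math.exp(t)) * math.exp(-a * t)

a, m = 0.6, 0.3  # hypothetical values with 1 > a > m > 0
t_star = math.log((a / m) * ((1 - m) / (1 - a)))

optimal_value = moment_bound(t_star, a, m)
chernoff_value = math.exp(-binary_divergence(a, m))

# No other t on a grid should beat the claimed minimizer.
grid_values = [moment_bound(0.01 * k, a, m) for k in range(1, 501)]
```

With these numbers `optimal_value` and `chernoff_value` coincide up to rounding, and every entry of `grid_values` is at least as large.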

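To prove the last claim of the theorem consider the variables $Y_i = \mu M^{-1/2} X_i M^{-1/2}$ with expectation $\mathbb{E}Y_i = \mu\mathbb{1}$ and $Y_i \in [0, \mathbb{1}]$, by hypotheses. Because of

$$\frac{1}{n}\sum_{i=1}^n X_i \in [(1-\epsilon)M, (1+\epsilon)M] \iff \frac{1}{n}\sum_{i=1}^n Y_i \in [(1-\epsilon)\mu\mathbb{1}, (1+\epsilon)\mu\mathbb{1}]$$

we can apply what we just proved to obtain

$$\Pr\left\{\frac{1}{n}\sum_{i=1}^n X_i \notin [(1-\epsilon)M, (1+\epsilon)M]\right\} \leq d\left[\exp(-nD((1-\epsilon)\mu\|\mu)) + \exp(-nD((1+\epsilon)\mu\|\mu))\right] \leq 2d \cdot \exp\left(-n \cdot \frac{\epsilon^2\mu}{2\ln 2}\right),$$

the last line by the already used inequality $D((1+x)\mu\|\mu) \geq \frac{1}{2\ln 2}x^2\mu$. ■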
E. Conjectures 

We have the feeling that in the estimates of the previous 
section we waste too much. In particular the theorems 
become useless in the infinite dimensional case, because in 
the traces we could only account for the supremum of the 
involved eigenvalues, multiplied by the dimension of the 
underlying space. 

Conjecture 20: Under the assumptions of theorem 18 it even holds that

$$\Pr\left\{\sum_{i=1}^n X_i \not\leq nA\right\} \leq \mathrm{Tr}\left[(\mathbb{E}\exp(TXT^* - TAT^*))^n\right],$$

since we conjecture that for i.i.d. random variables $Z, Z_1, \ldots, Z_n \in \mathfrak{A}_s$

$$\mathrm{Tr}\,\mathbb{E}\exp\left(\sum_{i=1}^n Z_i\right) \leq \mathrm{Tr}\left((\mathbb{E}\exp(Z))^n\right).$$

Note that this is indeed true for $n = 2$, thanks to the Golden-Thompson inequality!

For larger $n$ there seems to be no applicable generalization of the Golden-Thompson inequality, so a different approach is needed. We propose to take logarithms in the above conjecture instead of traces: by the monotonicity of $\mathrm{Tr}\exp A$ the conjecture is true if

$$\log\mathbb{E}\exp\left(\sum_{i=1}^n Z_i\right) \leq n\log\mathbb{E}\exp(Z).$$

Thus by induction and monotonicity of $\mathrm{Tr}\exp A$ we can indeed prove conjecture 20 if the following is true:

Conjecture 21: For finite families of selfadjoint operators $A_i$ and $B_j$,

$$\log\left(\sum_{i,j}\exp(A_i + B_j)\right) \leq \log\left(\sum_i \exp A_i\right) + \log\left(\sum_j \exp B_j\right).$$

Note that if all $A_i$, $B_j$ commute then equality holds!

It may be that taking this for granted one can prove the following conjecture (compare with theorem 19):

Conjecture 22: Let $X, X_1, \ldots, X_n$ i.i.d. random variables with values in $[0, \mathbb{1}]$, $\mathbb{E}X \leq M \leq A \leq \mathbb{1}$. Then

$$\Pr\left\{\sum_{i=1}^n X_i \not\leq nA\right\} \leq \mathrm{Tr}\exp(-n\mathcal{D}(A\|M)),$$

where $\mathcal{D}$ is the operator version of the binary I-divergence:

$$\mathcal{D}(A\|M) = \sqrt{A}(\log A - \log M)\sqrt{A} + \sqrt{\mathbb{1} - A}\,(\log(\mathbb{1} - A) - \log(\mathbb{1} - M))\sqrt{\mathbb{1} - A}.$$

Again, given conjecture 20 it would suffice to compare $\log\mathbb{E}\exp(TXT^* - TAT^*)$ and $\mathcal{D}(A\|M)$, for a clever choice of $T$.

F. An application 

In this last subsection we want to discuss one applica- 
tion of our estimates in "noncommutative combinatorics" , 
namely as a tool in applying the probabilistic method to the 
noncommutative analogue of covering hypergraphs. Apart 
from the application in the main text, we would like to 
point out two other ones: an approximation problem in
quantum estimation theory and its generalization to
asymptotic convex decompositions of POVMs [26].



F.1 Noncommutative hypergraphs

We will define noncommutative hypergraphs as generalizations
of the usual ones. To understand the following definition one
has to recall the correspondence between a compact space $X$
and the C*-algebra $C(X)$ of its continuous $\mathbb{C}$-valued
functions, provided by the Gelfand-Naimark theorem (see ||, ch. 2.3).
In the case of a finite discrete set this is summarized in the
fact that the positive idempotents of the function algebra are
exactly the characteristic functions of subsets. Thus we can talk
about hypergraphs $(\mathcal{V},\mathcal{E})$ (where $\mathcal{V}$ is the
finite vertex set, and $\mathcal{E} \subset 2^{\mathcal{V}}$ the set of
hyperedges, or edges for short) in the language of finite
dimensional commutative C*-algebras and certain of their
idempotents.

A noncommutative hypergraph $\Gamma$ is a pair $(\mathfrak{B}, \mathcal{E})$
with a finite dimensional C*-algebra $\mathfrak{B}$ and a set
$\mathcal{E} \subset [0, \mathbb{1}]$ (usually finite). We call $\Gamma$
strict if all elements of $\mathcal{E}$ are idempotents. Finally
$\Gamma$ is a quantum hypergraph if $\mathfrak{B}$ is the full operator
algebra of a finite dimensional complex Hilbert space $\mathcal{V}$,
in which case we denote $\Gamma$ as $(\mathcal{V}, \mathcal{E})$. From
the theory of finite dimensional C*-algebras it is known that
$\mathfrak{B}$ can be embedded into the full operator algebra of a
Hilbert space of dimension $\operatorname{Tr}\mathbb{1}$, preserving the
trace. Thus we will in the sequel always assume that we deal with
quantum hypergraphs.

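For a finite edge set $\mathcal{E}$ the degree is defined as the
operator

\[ \deg\mathcal{E} = \sum_{E\in\mathcal{E}} E. \]

A covering of $\Gamma = (\mathcal{V},\mathcal{E})$ is a finite family
$\mathcal{C}$ of edges such that $\deg\mathcal{C} \ge \mathbb{1}$.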
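
The covering condition is an operator inequality: the degree of the family minus the identity must be positive semidefinite, i.e. the smallest eigenvalue of the degree is at least 1. As an illustration only, the sketch below checks this for a qubit, assuming real symmetric 2x2 edges so that the smallest eigenvalue has a closed form; the helper names and the three example projectors are hypothetical.

```python
import math

def min_eigenvalue_2x2(m):
    """Smallest eigenvalue of a real symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = m
    return (a + d) / 2 - math.hypot((a - d) / 2, b)

def degree(edges):
    """Degree of a family of edges: the sum of the edge operators."""
    return [[sum(e[r][c] for e in edges) for c in range(2)] for r in range(2)]

def is_covering(edges, tol=1e-12):
    """True if degree(edges) >= identity, i.e. its smallest eigenvalue is at least 1."""
    return min_eigenvalue_2x2(degree(edges)) >= 1 - tol

# Hypothetical edges on a qubit: projectors onto |0>, |1> and |+>.
P0 = [[1.0, 0.0], [0.0, 0.0]]
P1 = [[0.0, 0.0], [0.0, 1.0]]
PLUS = [[0.5, 0.5], [0.5, 0.5]]

covers_a = is_covering([P0, P1])        # degree is the identity
covers_b = is_covering([P0, PLUS])      # degree has an eigenvalue below 1
covers_c = is_covering([P0, P1, PLUS])  # adding an edge keeps the covering
```

Note that the pair of nonorthogonal projectors fails to cover although its degree has trace 2: unlike in the classical case, counting is not enough and the whole spectrum of the degree matters.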



F.2 Covering theorems 

Now we come to our first covering theorem (for the classical
case compare Jl[):

Theorem 23: Let $\Gamma$ be a quantum hypergraph with

\[ \deg\Gamma \ge \delta\mathbb{1}. \]

Then there exists a covering of $\Gamma$ with
$k \le 1 + \frac{|\mathcal{E}|}{\delta}\, 8\ln 2\,\log d$ many edges.

This is the special case of the uniform distribution in the
following

Theorem 24: Let $\Gamma$ be a quantum hypergraph and $P$ a
probability distribution on $\mathcal{E}$, such that

\[ \sum_{E\in\mathcal{E}} P(E)E \ge \mu\mathbb{1}. \]

Then there exists a covering of $\Gamma$ with
$k \le 1 + 8(\ln 2\,\log d)\mu^{-1}$ many edges.

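Proof: Draw edges at random, i.e. consider i.i.d. random
variables $X, X_1, \ldots, X_k$ with $\Pr\{X = E\} = P(E)$.
Then we obtain, using theorem 19:

\[ \begin{aligned}
\Pr\Bigl\{\sum_{i=1}^k X_i \not\ge \mathbb{1}\Bigr\}
  &= \Pr\Bigl\{\sum_{i=1}^k X_i \not\ge k\cdot\frac{1}{k}\mathbb{1}\Bigr\} \\
  &\le d\exp\Bigl(-k\cdot D\Bigl(\frac{1}{k}\Big\|\mu\Bigr)\Bigr) \\
  &\le d\exp\Bigl(-k\cdot D\Bigl(\frac{\mu}{2}\Big\|\mu\Bigr)\Bigr) \\
  &\le d\exp\Bigl(-k\cdot\frac{1}{8\ln 2}\,\mu\Bigr),
\end{aligned} \]

the third line only if we have $k \ge 2\mu^{-1}$. Now the last
expression is smaller than 1 for

\[ k > 8(\ln 2\,\log d)\mu^{-1}, \]

justifying ex post our estimates. Hence for $k$ as in the
theorem there exists a covering with $k$ edges. ■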
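
The arithmetic behind the last step can be verified directly. The sketch below is only a numerical check under assumptions: exp and log are read to base 2, the values of `d` and `mu` are hypothetical, and the function names are introduced here for illustration. It confirms that the smallest integer $k$ above the threshold satisfies the side condition $k \ge 2\mu^{-1}$, makes the final expression smaller than 1, and stays within the bound claimed in the theorem.

```python
import math

def covering_bound(d, mu):
    """Right-hand side 1 + 8 (ln 2)(log2 d) / mu of the edge-count bound."""
    return 1 + 8 * math.log(2) * math.log2(d) / mu

def failure_bound(d, mu, k):
    """Last line of the estimate, d * 2^(-k mu / (8 ln 2)), with exp read to base 2."""
    return d * 2 ** (-k * mu / (8 * math.log(2)))

d, mu = 4, 0.25  # hypothetical dimension and eigenvalue lower bound
threshold = 8 * math.log(2) * math.log2(d) / mu
k = math.floor(threshold) + 1  # smallest integer strictly above the threshold

assert k >= 2 / mu                  # condition needed for the third line
assert failure_bound(d, mu, k) < 1  # so some draw of k edges is a covering
assert k <= covering_bound(d, mu)   # k respects the bound of the theorem
```

Since the failure probability of a random draw is strictly below 1, at least one choice of $k$ edges must be a covering, which is the probabilistic method in its simplest form.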
We apply this result to a generalization of a result on
covering numbers of hypergraphs, due to Posner and
McEliece j2|], obtained independently, but a little bit later,
by Ahlswede and reported in [Q. For a quantum hypergraph
$\Gamma = (\mathcal{V},\mathcal{E})$ define
$\Gamma^n = (\mathcal{V}^{\otimes n},\mathcal{E}_n)$, with

\[ \mathcal{E}_n = \{E^n = E_1\otimes\cdots\otimes E_n : E_1,\ldots,E_n\in\mathcal{E}\}. \]

We are interested in the covering number $c(n)$ of $\Gamma^n$, i.e. the
minimum cardinality of a covering of $\Gamma^n$. Finally define

\[ \tilde{c}(n) = \min\Bigl\{\sum_{E^n\in\mathcal{E}_n} v(E^n) \,:\, v \ge 0,\ \sum_{E^n\in\mathcal{E}_n} v(E^n)E^n \ge \mathbb{1}\Bigr\}. \]

(The $v$ can be seen as a continuous weight version of coverings,
and will be called generalized coverings.) It is immediate
that $c(n) \ge \tilde{c}(n)$.
Theorem 25: With

\[ C = -\log \max_{P \text{ p.d. on } \mathcal{E}} \min \sum_{E\in\mathcal{E}} P(E)E, \]

where the min means the minimal eigenvalue, one has



which implies

\[ c(n) \ge \exp(Cn), \]

\[ c(n) \le 1 + 8(\ln 2\,\log d)\cdot n\exp(Cn). \]

In particular

\[ \lim_{n\to\infty}\frac{1}{n}\log c(n) = \lim_{n\to\infty}\frac{1}{n}\log\tilde{c}(n) = C. \]
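
To make the quantity $C$ concrete, the following sketch estimates it by grid search for a quantum hypergraph with only two edges on a qubit. Everything here is an illustrative assumption: the edges are real symmetric 2x2 matrices, logarithms are to base 2, the grid has 1001 points, and the names `rate_two_edges` and `upper_bound` are hypothetical. For the two orthogonal projectors the optimum is the uniform distribution, giving minimal eigenvalue 1/2 and $C = 1$.

```python
import math

def min_eigenvalue_2x2(m):
    """Smallest eigenvalue of a real symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = m
    return (a + d) / 2 - math.hypot((a - d) / 2, b)

def mixture(p, e1, e2):
    """The operator p*e1 + (1-p)*e2, computed entrywise."""
    return [[p * e1[r][c] + (1 - p) * e2[r][c] for c in range(2)] for r in range(2)]

def rate_two_edges(e1, e2, steps=1000):
    """Grid-search estimate of C = -log2 max_P min-eig(sum_E P(E) E) for two edges."""
    best = max(min_eigenvalue_2x2(mixture(i / steps, e1, e2)) for i in range(steps + 1))
    return -math.log2(best)

def upper_bound(n, C, d=2):
    """The bound 1 + 8 (ln 2)(log2 d) n 2^(C n) on the covering number c(n)."""
    return 1 + 8 * math.log(2) * math.log2(d) * n * 2 ** (C * n)

# Hypothetical example: the two orthogonal rank-one projectors on a qubit.
P0 = [[1.0, 0.0], [0.0, 0.0]]
P1 = [[0.0, 0.0], [0.0, 1.0]]
C = rate_two_edges(P0, P1)

lower_1 = 2 ** (C * 1)      # lower bound exp(Cn) for n = 1
upper_1 = upper_bound(1, C)  # upper bound for n = 1
```

In this example the covering number for $n = 1$ is 2 (both projectors are needed and together sum to the identity), which sits between the two bounds; the factor $n$ in the upper bound comes from the dimension $d^n$ of the $n$-fold tensor product.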

Proof: The second estimate follows by applying the preceding
theorem with the distribution $P^{\otimes n}$.

The first is proved by induction on $n$. The case $n = 0$ is
trivial, and the case $n = 1$ is seen as follows: let $v^*$ be a
minimal weight generalized covering of $\Gamma$, i.e.

\[ \tilde{c}(1) = \sum_E v^*(E) \quad\text{and}\quad \sum_E v^*(E)E \ge \mathbb{1}. \]

What we have to show is that

\[ \tilde{c}(1) \ge \Bigl(\max_P \min \sum_E P(E)E\Bigr)^{-1}, \]

which means we have to find a distribution $P$ such that
$\sum_E P(E)E \ge \tilde{c}(1)^{-1}\mathbb{1}$.

With $P(E) = v^*(E)\,\tilde{c}(1)^{-1}$ this is obviously satisfied.

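Now assume $n > 0$, and let $v^*$ be a minimal weight generalized
covering of $\Gamma^n$. Define a probability distribution $Q$
on $\mathcal{E}$ by

\[ Q(E) = \frac{1}{\tilde{c}(n)} \sum_{E^n\in\mathcal{E}_n,\ E_n = E} v^*(E^n). \]

Multiplying the relation

\[ \sum_{E^n} v^*(E^n)E^n \ge \mathbb{1}^{\otimes n} \]

by $\mathbb{1}^{\otimes(n-1)}\otimes\pi$ (for a one-dimensional projector $\pi$ on
$\mathcal{V}$) from both sides and taking the trace over the last factor
we find

\[ \begin{aligned}
\mathbb{1}^{\otimes(n-1)} &\le \sum_{E^n} \operatorname{Tr}(\pi E_n)\, v^*(E^n)\, E^{n-1} \\
  &= \sum_{E^{n-1}} \Bigl(\sum_{E_n} \operatorname{Tr}(\pi E_n)\, v^*(E^n)\Bigr) E^{n-1}.
\end{aligned} \]

This means that we have a generalized covering of $\Gamma^{n-1}$
and hence

\[ \begin{aligned}
\tilde{c}(n-1) &\le \sum_{E_n\in\mathcal{E}} \operatorname{Tr}(\pi E_n) \sum_{E^{n-1}} v^*(E^n) \\
  &= \sum_{E_n\in\mathcal{E}} \operatorname{Tr}(\pi E_n)\, Q(E_n)\, \tilde{c}(n).
\end{aligned} \]

Thus for all $\pi$

\[ \sum_{E\in\mathcal{E}} \operatorname{Tr}(\pi E)\, Q(E) \ge \frac{\tilde{c}(n-1)}{\tilde{c}(n)}, \]

which in turn implies

\[ \min \sum_{E\in\mathcal{E}} Q(E)E \ge \frac{\tilde{c}(n-1)}{\tilde{c}(n)} \]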